Improved techniques for the identification of pseudogenes
Abstract
MOTIVATION: Pseudogenes are the genomic remnants of genes that are no longer functional. They are frequent in most eukaryotic genomes and are an important resource for comparative genomics. However, pseudogenes are often mis-annotated as functional genes in sequence databases. Current methods for identifying pseudogenes either rely on the presence of stop codons and frameshifts or are based on the ratio of non-silent to silent nucleotide substitution rates (dN/dS). A recent survey concluded that 50% of human pseudogenes have no detectable truncation in their pseudo-coding regions, indicating that the former methods lack sensitivity. The latter methods have been used to find sets of genes enriched for pseudogenes, but are not specific enough to accurately separate pseudogenes from expressed genes.

RESULTS: We introduce a program called pseudogene inference from loss of constraint (PSILC), which incorporates novel methods for separating pseudogenes from functional genes. The methods calculate the log-odds score that evolution along the final branch of the gene tree to the query gene has proceeded according to the following constraints: (i) a neutral nucleotide model compared to a Pfam domain encoding model (PSILC(nuc/dom)); (ii) a protein coding model compared to a Pfam domain encoding model (PSILC(prot/dom)). Using the manual annotation of human chromosome 6, we show that both of these methods classify pseudogenes more accurately than dN/dS when a Pfam domain alignment is available.

AVAILABILITY: PSILC is available from http://www.sanger.ac.uk/Software/PSILC
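As a rough illustration of what a log-odds comparison between two evolutionary models looks like, the sketch below sums per-site log ratios of the probabilities assigned by two models. It is not PSILC's implementation: the function name `log_odds`, the per-site independence assumption, and the probability values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def log_odds(probs_model_a, probs_model_b):
    """Return the log2-odds of model A versus model B.

    Each argument is a list of per-site probabilities (hypothetical inputs)
    that the respective model assigns to the observed change at that site.
    Sites are treated as independent, so the per-site log ratios are summed.
    """
    if len(probs_model_a) != len(probs_model_b):
        raise ValueError("both models must score the same number of sites")
    return sum(math.log2(a / b) for a, b in zip(probs_model_a, probs_model_b))


# Made-up per-site probabilities, for illustration only.
neutral_model = [0.25, 0.25, 0.25, 0.25]  # stands in for a neutral nucleotide model
domain_model = [0.5, 0.125, 0.5, 0.25]    # stands in for a domain encoding model

score = log_odds(neutral_model, domain_model)
print(score)
```

Under the convention used in this sketch, a positive score favours the first model (here the neutral one) and a negative score favours the second.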
Journal: Bioinformatics
Volume: 20, Suppl 1
Publication date: 2004